http://www.abbs.info E-mail:[email protected]
ISSN 0582-9879 Acta Biochim et Biophysica Sinica 2004, 36(1):016-020 CN 31-1300/Q
DNAskew: Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences
Xiang-Ru MA, Shao-Bo XIAO, Ai-Zhen GUO, Jian-Qiang LÜ, and Huan-Chun CHEN*
Laboratory of Animal Virology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan 430070, China
Abstract Sueoka and Lobry independently proposed
that, in the absence of bias between the two DNA strands for
mutation and selection, the base composition within each strand should be A=T
and C=G (this state is called Parity Rule type 2, PR2). However, the genome
sequences of many bacteria, vertebrates and viruses show asymmetries in base
composition and gene direction. To investigate how base composition skews relate
to replication orientation, gene function, codon usage bias and phylogenetic
evolution, we developed a program called DNAskew for the statistical analysis of
strand asymmetry and codon composition bias in DNA sequences. The program can
also be used to predict the replication boundaries of genome sequences. The method builds on
the fact that there are compositional asymmetries between the leading and the
lagging strands during replication. DNAskew is written in Perl and
runs on the Linux operating system. It works quickly with annotated or
unannotated sequences in GBFF (GenBank flat file) or FASTA format. The source
code is freely available for academic use at
http://www.epizooty.com/pub/stat/DNAskew.
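
To make the idea of skew-based boundary prediction concrete, the following is a minimal Python sketch; it is not the DNAskew source code, which is written in Perl. It computes the GC skew (G-C)/(G+C) and the AT skew (A-T)/(A+T) of a single strand, both of which should be close to zero under PR2, and it locates the extremes of the cumulative GC skew. All function names are hypothetical. The sketch assumes the convention reported for many bacterial genomes, in which the minimum of the cumulative GC skew lies near the replication origin and the maximum near the terminus; this convention may not hold for every genome.

```python
from itertools import accumulate


def gc_skew(seq):
    """Return (G - C) / (G + C) for one strand; 0.0 if it has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def at_skew(seq):
    """Return (A - T) / (A + T) for one strand; 0.0 if it has no A or T."""
    s = seq.upper()
    a, t = s.count("A"), s.count("T")
    return (a - t) / (a + t) if (a + t) else 0.0


def cumulative_gc_skew(seq):
    """Running sum along the sequence: +1 for G, -1 for C, 0 for any other symbol."""
    step = {"G": 1, "C": -1}
    return list(accumulate(step.get(base, 0) for base in seq.upper()))


def predict_boundaries(seq):
    """Return 0-based indices (minimum, maximum) of the cumulative GC skew.

    Assumption: the minimum is read as a putative origin and the maximum
    as a putative terminus. Ties resolve to the first occurrence.
    """
    cum = cumulative_gc_skew(seq)
    if not cum:
        raise ValueError("sequence is empty")
    positions = range(len(cum))
    origin = min(positions, key=cum.__getitem__)
    terminus = max(positions, key=cum.__getitem__)
    return origin, terminus
```

For a real genome the sequence would first be read from a FASTA or GenBank file, and a windowed version of `gc_skew` would normally be plotted alongside the cumulative curve to judge how sharp the switch in skew sign is.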
Key words strand asymmetry; base composition; statistical analysis; replication origin; bioinformatics
-----------------
Received: July 8, 2003 Accepted: October 29, 2003
This work was supported by a grant from the National High Technology
Research and Development Program of China (863 Program) (No. 2001AA213051).
*Corresponding author: Tel, 86-27-87282608; E-mail, [email protected]